TransRate: reference-free quality assessment of de novo transcriptome assemblies.
Abstract
TransRate is a tool for reference-free quality assessment of de novo transcriptome assemblies. Using only the sequenced reads and the assembly as input, we show that multiple common artifacts of de novo transcriptome assembly can be readily detected. These include chimeras, structural errors, incomplete assembly, and base errors. TransRate evaluates these errors to produce a diagnostic quality score for each contig, and these contig scores are integrated to evaluate whole assemblies. Thus, TransRate can be used for de novo assembly filtering and optimization as well as comparison of assemblies generated using different methods from the same input reads. Applying the method to a data set of 155 published de novo transcriptome assemblies, we deconstruct the contribution that assembly method, read length, read quantity, and read quality make to the accuracy of de novo transcriptome assemblies and reveal that variance in the quality of the input data explains 43% of the variance in the quality of published de novo transcriptome assemblies. Because TransRate is reference-free, it is suitable for assessment of assemblies of all types of RNA, including assemblies of long noncoding RNA, rRNA, mRNA, and mixed RNA samples.
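The abstract says that per-contig scores are integrated into a whole-assembly score but does not give the formulas. The sketch below is one way such an integration could look, not TransRate's actual implementation. It assumes that each contig carries one score in [0, 1] per error type, that these are multiplied into a contig score, and that the assembly score is the geometric mean of contig scores scaled by the fraction of reads that map. The names `contig_score`, `assembly_score`, `filter_contigs`, `SCORE_FLOOR`, and the example component keys are all hypothetical.

```python
import math

# Hypothetical floor so that a single zero-scoring contig does not force the
# geometric mean to zero; the value is an assumption made for this sketch.
SCORE_FLOOR = 1e-6


def contig_score(components):
    """Multiply per-error component scores, each expected to lie in [0, 1]."""
    score = 1.0
    for name, value in components.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"component {name!r} out of range: {value}")
        score *= value
    return score


def assembly_score(contig_scores, mapped_fraction):
    """Geometric mean of contig scores, scaled by the fraction of reads that map."""
    if not contig_scores:
        return 0.0
    log_sum = sum(math.log(max(s, SCORE_FLOOR)) for s in contig_scores)
    return math.exp(log_sum / len(contig_scores)) * mapped_fraction


def filter_contigs(scores_by_contig, cutoff):
    """Keep the names of contigs whose score is at least the cutoff."""
    return [name for name, s in scores_by_contig.items() if s >= cutoff]


if __name__ == "__main__":
    # Assumed component names, one per artifact class listed in the abstract.
    contigs = {
        "contig_1": {"chimera": 0.9, "structure": 0.8, "coverage": 1.0, "base": 0.95},
        "contig_2": {"chimera": 0.3, "structure": 0.5, "coverage": 0.6, "base": 0.9},
    }
    scores = {name: contig_score(parts) for name, parts in contigs.items()}
    print(scores)
    print(assembly_score(list(scores.values()), mapped_fraction=0.85))
    print(filter_contigs(scores, cutoff=0.5))
```

A multiplicative contig score means that any one severe artifact pulls the contig down regardless of the other components, which suits filtering. The cutoff passed to `filter_contigs` is an illustrative choice, not a recommended value.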
Related articles
Title Corresponding Author
TransRate is a tool for reference-free quality assessment of de novo transcriptome assemblies. Using only sequenced reads as the input, TransRate measures the quality of individual contigs and whole assemblies, enabling assembly optimization and comparison. TransRate can accurately evaluate assemblies of conserved and novel RNA molecules of any kind in any species. We show that i...
Clustering of Short Read Sequences for de novo Transcriptome Assembly
Given the importance of transcriptome analysis in various biological studies and considering the vast amount of whole transcriptome sequencing data, it seems necessary to develop an algorithm to assemble transcriptome data. In this study we propose an algorithm for transcriptome assembly in the absence of a reference genome. First, the contiguous sequences are generated using de Bruijn graph with d...
The Oyster River Protocol: A Multi Assembler and Kmer Approach For de novo Transcriptome Assembly
Characterizing transcriptomes in non-model organisms has resulted in a massive increase in our understanding of biological phenomena. This boon, largely made possible via high-throughput sequencing, means that studies of functional, evolutionary and population genomics are now being done by hundreds or even thousands of labs around the world. For many, these studies begin with a de novo...
Compacting and correcting Trinity and Oases RNA-Seq de novo assemblies
BACKGROUND De novo transcriptome assembly of short reads is now a common step in expression analysis of organisms lacking a reference genome sequence. Several software packages are available to perform this task. Even if their results are of good quality it is still possible to improve them in several ways including redundancy reduction or error correction. Trinity and Oases are two commonly us...
A glance at quality score: implication for de novo transcriptome reconstruction of Illumina reads
Downstream analyses of short-reads from next-generation sequencing platforms are often preceded by a pre-processing step that removes uncalled and wrongly called bases. Standard approaches rely on their associated base quality scores to retain the read or a portion of it when the score is above a predefined threshold. It is difficult to differentiate sequencing error from biological variation w...
Journal title:
- Genome Research
Volume 26, Issue 8
Publication date: 2016